Project description

This project is dedicated to hypothesis prioritization and A/B testing.

We have 3 datasets: 1 for hypothesis prioritization and 2 for A/B testing:

Hypothesis Prioritization:

The hypotheses dataset contains a number of hypotheses with brief descriptions and qualities scored on a 0-10 scale, such as:

A/B testing:

The orders dataset contains information on every order and on the user who made it:

The visits dataset contains the number of visits for each date and A/B test group:

Table of Contents

Step 1. Downloading and preprocessing the data

We will use 8 libraries:

downcasting

Some downcasting to save memory
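The downcasting step itself isn't shown here; a minimal sketch of how it could look with pandas, assuming hypothetical column names (`visitorId`, `revenue`) standing in for the real orders columns:

```python
import pandas as pd

# Toy frame standing in for the orders dataset (column names are assumptions)
df = pd.DataFrame({"visitorId": [101, 102, 103], "revenue": [25.4, 110.0, 48.9]})

# Downcast each numeric column to the smallest dtype that still fits its values
df["visitorId"] = pd.to_numeric(df["visitorId"], downcast="integer")
df["revenue"] = pd.to_numeric(df["revenue"], downcast="float")

print(df.dtypes)  # int64 -> int8, float64 -> float32 for these toy values
```

On large tables this routinely cuts memory usage severalfold without changing the values.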

comparing

number of orders made
number of visits made
difference

Double-group participants

Let's find out whether there are users who belong to both A and B at the same time, and remove them

We found users who belong to both A and B; we had better remove them to keep the data clean

Let's also subtract those spoiled orders from the visits table
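A minimal sketch of the double-group filter, assuming a toy orders table with hypothetical `visitorId` and `group` columns:

```python
import pandas as pd

# Toy stand-in for the orders table
orders = pd.DataFrame({
    "visitorId": [1, 1, 2, 3, 3, 4],
    "group":     ["A", "B", "A", "B", "B", "A"],
})

# Users that appear in more than one test group
groups_per_user = orders.groupby("visitorId")["group"].nunique()
double_users = groups_per_user[groups_per_user > 1].index

# Drop every order placed by a double-group user
clean_orders = orders[~orders["visitorId"].isin(double_users)]
print(list(double_users), len(clean_orders))  # user 1 is in both groups -> removed
```

The same `double_users` index can then be used to subtract those users' contribution from the visits table.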

Conclusion

The datasets contain information on monthly visits and orders on the site.
There are users who have been mistakenly assigned to both groups A and B. I removed them, losing 181 rows of orders data from 58 users. I also subtracted those users from the visits table. Now the data is clean; no duplicates found.

hypotheses:
We have 9 hypotheses to test, with clear and understandable ratings.
A/B:
The orders and visits datasets are ready for A/B test analysis thanks to the group separation they already have. It's a plain A/B test, not a multiple test. Group B has both slightly more visitors and slightly more orders. At first glance, the orders-per-visits ratio for B is ~0.4% higher, which is not much, certainly not for these sales volumes. Further insights are yet to be discovered.

Step 2. Part 1. Prioritizing Hypotheses

ICE framework

RICE framework

Prioritization changes

Let's see whether the top 5 hypotheses for ICE and RICE are the same:

They are. How about the top 3 leaders?
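The scoring itself is straightforward; a minimal sketch with toy 0-10 scores (the hypothesis names and numbers here are assumptions, not the project's real data) showing that ICE and RICE can crown different leaders:

```python
import pandas as pd

# Hypothetical hypotheses frame with 0-10 scores, as in the dataset
h = pd.DataFrame({
    "Hypothesis": ["h0", "h1", "h2"],
    "Reach":      [3, 10, 8],
    "Impact":     [10, 5, 3],
    "Confidence": [8, 4, 7],
    "Effort":     [6, 10, 3],
})

# ICE = Impact * Confidence / Effort; RICE adds the Reach multiplier
h["ICE"] = h["Impact"] * h["Confidence"] / h["Effort"]
h["RICE"] = h["Reach"] * h["Impact"] * h["Confidence"] / h["Effort"]

print(h.sort_values("ICE", ascending=False)["Hypothesis"].tolist())   # ['h0', 'h2', 'h1']
print(h.sort_values("RICE", ascending=False)["Hypothesis"].tolist())  # ['h2', 'h0', 'h1']
```

In this toy example h0 wins on ICE but h2, with its much larger Reach, wins on RICE, which mirrors how the two frameworks diverged in this project.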

Conclusion:

ICE/RICE comparison plot

Step 3. Part 2. A/B Test Analysis

Cumulative revenue by group

Let's compare the revenue, one of the most telling indicators of success
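A minimal sketch of how cumulative revenue per group can be built, using a toy orders log (dates and amounts are made up for illustration):

```python
import pandas as pd

# Toy orders log: date, group, revenue
orders = pd.DataFrame({
    "date":    pd.to_datetime(["2019-08-01", "2019-08-01", "2019-08-02", "2019-08-02"]),
    "group":   ["A", "B", "A", "B"],
    "revenue": [100.0, 80.0, 50.0, 120.0],
})

# Daily revenue per group, then a running (cumulative) sum within each group
daily = orders.groupby(["date", "group"], as_index=False)["revenue"].sum()
daily["cum_revenue"] = daily.sort_values("date").groupby("group")["revenue"].cumsum()
print(daily)  # A: 100 -> 150, B: 80 -> 200
```

Plotting `cum_revenue` against `date` for each group gives the cumulative revenue chart discussed below.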

Conclusions and conjectures

Group B brought 26439 more revenue than group A (A: 53212, B: 79651), leading from the second day of the test till the end, with a great surge on 2019-08-18, while group A kept the same pace, steadily gaining revenue. Probably group B introduced different marketing methods, announced a very popular item that caught users' attention, or offered a nice discount that day, causing a huge order flow; but it was a one-day event, because afterwards B continued at its usual pace. It would be interesting to find out what happened there. In short: the revenue margin arose not due to a "bad" flow of group A visitors, but due to something great in group B on 2019-08-18 and B's generally stronger pace.

Cumulative average order size by group

Let's compare the average order size situation

Conclusions and conjectures

Once again we see that group B leads. A and B met on 2019-08-13, but due to the B order surge mentioned above, the groups separated, and A ended up with a cumulative average order size of 113 while B reached 145: a difference of 32. It looks like B is the definite leader, but maybe this happened due to the huge 2019-08-18 outlier.

Relative difference graph for the average purchase sizes

Conclusions and conjectures

We can see that, starting low and finishing on top, group B shows interesting fluctuations. It has clear rises on 2019-08-01, on 2019-08-05, and on 2019-08-18, the date that stands out again. It falls on 2019-08-08, from 2019-08-11 to 2019-08-13, and from 2019-08-19 onwards. We see that B is nearly 27% more "successful" than A: the difference grows as high as 50% but falls back to 27% at the end. Some of the rises seem natural, but the 2019-08-17 one looks like it is due to the outliers.

Each group's conversion rate daily

Conclusions and conjectures

Both the mean and median difference between A's and B's conversion are less than 0.1%, meaning that neither of them can be called a leader. The graph is symmetrical after A and B meet on 2019-08-06, except for the spikes on the first day (which still make no difference to the mean and median). So even though A's conversion from 2019-08-06 till the end was lower than B's, A can't be considered a loser.
It is worth mentioning that all the fluctuations are very small and happen in the range from 2.4% to 3.3% for B and from 2.5% to 3.5% for A: so the fluctuations are less than 1%. The difference between the final conversion rates is just 0.4%: 2.9% for B and 2.5% for A.
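A minimal sketch of how the daily conversion rate per group can be derived, assuming toy daily aggregates in place of the real orders and visits tables:

```python
import pandas as pd

# Toy daily aggregates standing in for the visits and orders tables
visits = pd.DataFrame({
    "date":   pd.to_datetime(["2019-08-01", "2019-08-01", "2019-08-02", "2019-08-02"]),
    "group":  ["A", "B", "A", "B"],
    "visits": [400, 380, 420, 430],
})
orders_per_day = pd.DataFrame({
    "date":   pd.to_datetime(["2019-08-01", "2019-08-01", "2019-08-02", "2019-08-02"]),
    "group":  ["A", "B", "A", "B"],
    "orders": [10, 11, 12, 13],
})

# Conversion = orders / visits, matched by date and group
merged = visits.merge(orders_per_day, on=["date", "group"])
merged["conversion"] = merged["orders"] / merged["visits"]
print(merged[["date", "group", "conversion"]])  # e.g. A on 08-01: 10/400 = 2.5%
```

Plotting `conversion` per group over `date` gives the daily conversion chart described above.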

Scatter chart of the number of orders per user

Conclusions and conjectures

In general the number of purchases per user is 1, but we do have users with 2 and 3 orders. I would call 3 an outlier: having 2 orders is rare, but it is normal customer behaviour, and we can't just cut those users.

95th and 99th percentiles for the number of orders per user

Conclusions and conjectures

Just 5% of users purchased more than 1 order and only 1% bought more than 2. We can set 3 orders per user as the outlier threshold (yes, 2 orders and above is the choice of only 1% of users, but I don't think 2 orders per user is too many).
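The percentile computation can be sketched with NumPy; the per-user order counts here are made up to mirror the pattern described above (most users buy once):

```python
import numpy as np

# Hypothetical per-user order counts: 96 users with 1 order, 4 with 2, 1 with 3
orders_per_user = np.array([1] * 96 + [2] * 4 + [3])

# 95th and 99th percentiles of orders per user
p95, p99 = np.percentile(orders_per_user, [95, 99])
print(p95, p99)  # 1.0 2.0
```

Anything above the 99th percentile (here, 3 orders) is then treated as anomalous, matching the cutoff chosen above.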

Scatter chart of order prices

A closer look at the values

Conclusions and conjectures

We have 1 enormous outlier of nearly 19920 per order and some other outliers that lie closer to the average values. These must be one of the reasons the data is skewed. We can also see that the 2 highest-priced orders came from group B, and I would say they boosted the revenue surge that occurred from 2019-08-18 to 2019-08-19.
At first glance, I would say that an order price can be considered an outlier if it is above 600, but with respect to the percentiles (which can be found below), I would choose 450.

Percentile of order prices

Conclusions and conjectures

Only 5% of orders cost more than 435.5 and only 1% more than 900. Let's consider 450 the base value for finding outliers.

Statistical significance of the difference in conversion

hypotheses description

Here we try to establish whether there is a statistically significant difference in conversion between the groups or not. We work with the raw data. Let's formulate our hypotheses:

Conclusions and conjectures

Using the raw data, we found a statistically significant difference in conversion between A and B, with A lower than B. The relative conversion gain of B is 15%, but let's see whether outliers have something to do with it or not.
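One common way to run this test (a sketch, not necessarily the exact method used here) is a Mann-Whitney U test on per-visitor order counts, with zeros for visitors who never ordered; the sample sizes and counts below are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-visitor order counts: zeros for non-buyers (toy numbers)
sample_a = np.concatenate([np.zeros(975), rng.integers(1, 3, 25)])   # ~2.5% converters
sample_b = np.concatenate([np.zeros(971), rng.integers(1, 3, 29)])   # ~2.9% converters

# Two-sided Mann-Whitney U test on the raw per-visitor counts
stat, p_value = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")

# Relative conversion gain of B over A
rel_gain = sample_b.mean() / sample_a.mean() - 1
print(round(p_value, 3), round(rel_gain, 3))
```

If `p_value` falls below the chosen significance level (e.g. 0.05), the difference in conversion is declared statistically significant.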

Statistical significance of the difference in average order size

hypotheses description

Here we try to establish whether there is a statistically significant difference in the average order size per user between the groups. We work with the raw data. Let's formulate our hypotheses:

Conclusions and conjectures

The raw data shows that there is no statistically significant difference in average order size; the relative gain of group B is 27%, so B is unsurprisingly bigger. I expected the raw data to show a statistically significant difference, but apparently the outliers don't cause one.

Statistical significance of the difference in conversion of filtered data

hypotheses description

Here we try to establish whether there is a statistically significant difference in conversion between the groups or not. We work with the filtered data. Let's formulate our hypotheses:

Conclusions and conjectures

After removing the outliers, we still found a statistically significant difference in conversion. The relative gain of group B was 15%, and now it is 18.7%; A is lower than B.

Statistical significance of the difference in average order size for filtered data

hypotheses description

Here we try to establish whether there is a statistically significant difference in the average order size per user between the groups. We work with the filtered data. Let's formulate our hypotheses:

Conclusions and conjectures

There is still no statistically significant difference in the average order size for the filtered data, but the relative gain of B dropped from 27% to -2%, so those huge values were indeed due to outliers.
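The filter-and-retest step can be sketched as follows (toy numbers; two extreme orders are planted in group B to mimic the outliers found above, and 450 is the cutoff from the percentile analysis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-order revenues; two extreme orders are planted in group B
revenue_a = rng.normal(120, 30, 500)
revenue_b = np.concatenate([rng.normal(120, 30, 498), [5000.0, 19920.0]])

# Raw relative gain of B is inflated by the outliers
raw_gain = revenue_b.mean() / revenue_a.mean() - 1

# Filter out anomalously large orders using the 450 cutoff
limit = 450
filt_a = revenue_a[revenue_a <= limit]
filt_b = revenue_b[revenue_b <= limit]

# Mann-Whitney U test on the filtered samples
stat, p_value = stats.mannwhitneyu(filt_a, filt_b, alternative="two-sided")
filt_gain = filt_b.mean() / filt_a.mean() - 1
print(round(raw_gain, 2), round(filt_gain, 2), round(p_value, 3))
```

With the planted outliers removed, the relative gain of B collapses toward zero, which is exactly the pattern observed in the project's filtered data.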

Step 4. Decision based on the test results.

So, the filtered data showed us that the anomaly-free conversion rate of B is 18.7% higher than A's, while the average order size shows no difference, and B even loses by 2%. The conversion difference is significant, but if the conversion difference doesn't influence revenue, we haven't really achieved anything special: the effect of B's good conversion is offset by the fact that its income is even less than A's. Therefore I'm stopping the test and conclude that there is no statistically significant difference between the groups; their average revenue difference is not statistically significant and is less than 2%.

Step 5. General Conclusion

In this project I had 3 datasets to work with. The first contained information on several hypotheses with their qualities rated. To prioritize the hypotheses that may help boost revenue, I implemented the ICE and RICE frameworks and came to different conclusions. ICE and RICE produce the same top 5 leaders, #2, #0, #6, #8, #7, but their top hypotheses differ. I used percentage share to decide which hypothesis to choose: the RICE leader stands 28.57% apart from the closest hypothesis, while ICE has a smaller margin of 17.72%. Since RICE also accounts for the number of users a hypothesis can affect, and since the percentage margin for RICE was bigger, I chose the RICE leader hypothesis: "Add product recommendation blocks to the store's site. This will increase conversion and average purchase size".

The other 2 datasets were orders and visits, which allowed me to launch an A/B test and analyze its results.

The results were really tricky, because some data suggested we should choose hypothesis B due to its better performance: its cumulative revenue was 26439 bigger, its cumulative average order size was nearly 32 greater, it ended up with an almost 27% greater relative average order size compared to A, and B's conversion rate was also higher than A's (but just by 0.4%). The thing is that I noticed some clear surges in the distributions and decided that they may have been caused by outliers, so the data could be skewed and we shouldn't rely on it before checking. I decided to clear the data of anomalies and to see whether the groups differ because of them or not.
I used scatter plots and percentiles to establish what number of orders and what order size we should treat as the lower bound for anomalous data, and concluded to count 450 as an anomalously large order size and 3 as an anomalously large number of orders.

I conducted several statistical tests to establish whether there is a statistically significant difference in conversion and in average purchase size between A and B. The tests were run both on the raw data and on the filtered data with no anomalous users, to check the anomalies' impact:
Conversion rate comparison on both raw and filtered data showed a statistically significant difference between the conversion rates, with B higher than A: a conversion gain of 15% (raw) and 18% (filtered). So in terms of conversion, B is the leader.
As for average order size, both raw and filtered data showed NO statistically significant difference between the average order size of A and B, and the relative gain of B fell from 27% to -2%: the difference was due to outliers. So, even though the conversion difference is significant, the effect of B's good conversion is offset by B's income, which is even less than A's. Considering revenue the more important factor, I'm stopping the test and conclude that there is no statistically significant difference between the groups.